Scalable Nonparametric Multiway Data Analysis

Authors

  • Shandian Zhe
  • Zenglin Xu
  • Xinqi Chu
  • Yuan Qi
  • Youngja Park
Abstract

Multiway data analysis deals with multiway arrays, i.e., tensors, and its goal is twofold: predicting missing entries by modeling the interactions between array elements, and discovering hidden patterns, such as clusters or communities in each mode. Despite the success of existing tensor factorization approaches, they are either unable to capture nonlinear interactions or too computationally expensive to handle massive data. In addition, most existing methods lack a principled way to discover latent clusters, which is important for a better understanding of the data. To address these issues, we propose a scalable nonparametric tensor decomposition model. It employs a Dirichlet process mixture (DPM) prior to model the latent clusters, and it uses local Gaussian processes (GPs) to capture nonlinear relationships and to improve scalability. An efficient online variational Bayes Expectation-Maximization algorithm is proposed to learn the model. Experiments on both synthetic and real-world data show that the proposed model is able to discover latent clusters with higher prediction accuracy than competitive methods. Furthermore, the proposed model obtains significantly better predictive performance than the state-of-the-art large-scale tensor decomposition algorithm, GigaTensor, on two large datasets with billions of entries.
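
To make the Dirichlet process mixture prior mentioned in the abstract more concrete, here is a minimal sketch of the standard truncated stick-breaking construction for drawing cluster weights. It is not the authors' model or inference algorithm; the function name `stick_breaking_weights`, the concentration parameter `alpha`, and the truncation level are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha: DP concentration parameter (hypothetical value chosen by the caller).
    truncation: number of components kept (an approximation to the infinite DP).
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        # Break off a Beta(1, alpha) fraction of the remaining stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # Give the leftover stick to the last component so the weights sum to 1.
    weights.append(remaining)
    return weights


weights = stick_breaking_weights(alpha=2.0, truncation=10)
```

Smaller values of `alpha` tend to concentrate the mass on a few components, which is how a prior of this kind can favor a small number of latent clusters without fixing that number in advance.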

Similar resources

Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis

Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches, such as the Tucker decomposition and CANDECOMP/PARAFAC (CP), amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and ...

Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors

We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. The key issue of pre-specifying the rank of the decomposition is sidestepped in a principled manner using a multiplicative gamma process prior. Both continuous and binary data can be analyzed under the framework, in a coherent way using fully conjugate Bayesian analysis. In par...

InfTucker: t-Process based Infinite Tensor Decomposition

Tensor decomposition is a powerful tool for multiway data analysis. Many popular tensor decomposition approaches—such as the Tucker decomposition and CANDECOMP/PARAFAC (CP)—conduct multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and outliers. To ad...

Multiway Regularized Generalized Canonical Correlation Analysis

Regularized Generalized Canonical Correlation Analysis (RGCCA) is currently geared toward the analysis of two-way data matrices. In this paper, multiway RGCCA (MGCCA) extends RGCCA to the multiway data configuration. More specifically, MGCCA aims at studying the complex relationships between a set of three-way data tables.

CLAM: Connection-less, Lightweight, and Multiway Communication Support for Distributed Computing

A number of factors motivate and favor the implementation of communication protocols in user-space. There is a particularly strong motivation for the provision of scalable, multiway and connectionless transport for distributed computing, multimedia, and conferencing applications. This is also true of high speed networking, where it is beneficial to keep the OS kernel out of the critical path in ...

Publication date: 2015